Clustering Scatter Plots Using Data Depth Measures

Authors

  • Zhanpan Zhang
  • Xinping Cui
  • Daniel R. Jeske
  • Xiaoxiao Li
  • Jonathan Braun
  • James Borneman

Abstract

Clustering is rapidly becoming a powerful data mining technique and has been broadly applied to many domains, such as bioinformatics and text mining. However, existing methods can only handle a data matrix of scalars. In this paper, we introduce a hierarchical clustering procedure that can handle a data matrix of scatter plots. To reflect the nature of the data more accurately, we introduce a dissimilarity statistic based on "data depth" that measures the discrepancy between two bivariate distributions without oversimplifying the underlying pattern. We then combine hypothesis testing with hierarchical clustering to cluster the rows and columns of the data matrix of scatter plots simultaneously. We also propose novel painting metrics and construct heat maps to allow visualization of the clusters. We demonstrate the utility and power of our new clustering method through simulation studies and an application to a microbe-host-interaction study.
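The abstract does not give the exact form of the depth-based dissimilarity or of the hypothesis test, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes Mahalanobis depth, D(x; F) = 1 / (1 + (x - μ)ᵀ Σ⁻¹ (x - μ)), and the Liu-Singh Q index, Q(F, G) = P(D(X; F) ≤ D(Y; F)) with X ~ F and Y ~ G, which is about 1/2 when F = G. The hypothetical helper `depth_dissimilarity` symmetrizes |Q - 1/2|, and the hypothetical `average_linkage` clusters a list of scatter plots into a fixed number of groups. It does not reproduce the paper's hypothesis-testing stopping rule, the simultaneous row/column clustering, or the painting metrics.

```python
import numpy as np


def mahalanobis_depth(points, sample):
    """Mahalanobis depth 1 / (1 + d^2) of each row of `points` with respect to
    the empirical mean and covariance of `sample` (arrays of shape (n, 2))."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared distances
    return 1.0 / (1.0 + d2)


def q_index(ref, other):
    """Liu-Singh Q(F, G): average over y in `other` of the fraction of x in
    `ref` whose depth (w.r.t. `ref`) is <= the depth of y. Close to 0.5 when
    both samples come from the same distribution."""
    depth_ref = mahalanobis_depth(ref, ref)
    depth_other = mahalanobis_depth(other, ref)
    return float(np.mean(depth_ref[None, :] <= depth_other[:, None]))


def depth_dissimilarity(f, g):
    """Assumed dissimilarity: symmetrized distance of both Q indices from 0.5."""
    return 0.5 * (abs(q_index(f, g) - 0.5) + abs(q_index(g, f) - 0.5))


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage on a square
    dissimilarity matrix; returns one integer label per item."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (b > a)
        del clusters[b]
    labels = np.empty(len(dist), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Synthetic data: six scatter plots, three drawn from a tight correlated cloud
# and three from a wider uncorrelated one.
rng = np.random.default_rng(0)
tight = [[1.0, 0.8], [0.8, 1.0]]
wide = [[4.0, 0.0], [0.0, 4.0]]
plots = [rng.multivariate_normal([0.0, 0.0], tight, size=200) for _ in range(3)]
plots += [rng.multivariate_normal([0.0, 0.0], wide, size=200) for _ in range(3)]

n = len(plots)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = depth_dissimilarity(plots[i], plots[j])

labels = average_linkage(dist, n_clusters=2)
print(labels)
```

Under these assumptions the two groups of synthetic plots should end up in separate clusters. Mahalanobis depth is used here only because it is short to write; halfspace (Tukey) or simplicial depth are common, more robust alternatives at higher computational cost. In the procedure the abstract describes, a hypothesis test rather than a fixed `n_clusters` would decide when to stop merging.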

Similar sources

Interactive Graphic Clustering Using the Promenade System

This approach to clustering stresses the use of multivariate plots that allow the user, for example, to "fly through hyperspace," observing his data. In addition, multivariate histograms, scatter plots, link-node plots, waveform plots, and others can be dynamically controlled from the console to provide a unique visual insight into data structure, which cannot be obtained from conventional off...

Visual analytics of large multidimensional data using variable binned scatter plots

The scatter plot is a well-known method of visualizing pairs of two-dimensional continuous variables. Multidimensional data can be depicted in a scatter plot matrix. They are intuitive and easy-to-use, but often have a high degree of overlap which may occlude a significant portion of data. In this paper, we propose variable binned scatter plots to allow the visualization of large amounts of dat...

Variable binned scatter plots

The scatter plot is a well-known method of visualizing pairs of two continuous variables. Scatter plots are intuitive and easy-to-use, but often have a high degree of overlap which may occlude a significant portion of the data. To analyze a dense non-uniform dataset, a recursive drill-down is required for detailed analysis. In this paper, we propose variable binned scatter plots to allow the vi...

Identifying Locally Interesting Motifs for Exploration of Scatter Plot Matrices

Scatter plots are effective diagrams to visualize distributions, clusters and correlations in two-dimensional data space. For high-dimensional data, scatter plot matrices can be formed to show all two-dimensional combinations of dimensions. Several previous approaches for exploration of large scatter plot spaces have focused on ranking and sorting scatter plot matrices based on global patterns. ...

Convex Hull Brushing in Scatter Plots - Multi-dimensional Correlation Analysis

Interactive Visual Analysis has been widely used for the reason that it allows users to investigate highly complex data in coordinated multiple views, showing different perspectives over data. In order to relate data, multiple techniques of brushing have been introduced. This work extends the state of the art by introducing the Convex Hull (CH) Brush, which is a new way of selecting and interpr...

Volume: Suppl 5
Publication date: 2010